منابع مشابه
Parallel Texts Extraction from Multimodal Comparable Corpora
Statistical machine translation (SMT) systems depend on the availability of domain-specific bilingual parallel text. However parallel corpora are a limited resource and they are often not available for some domains or language pairs. We analyze the feasibility of extracting parallel sentences from multimodal comparable corpora. This work extends the use of comparable corpora by using audio sour...
متن کاملMetaphor Identification in Large Texts Corpora
Identifying metaphorical language-use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification. The algorithms are variations of the same core algorithm. We evaluate the algorithms on two corpora of Reuters and the New York Times articles. The paper presents the most comprehensive study of ...
متن کاملBuilding Corpora for the Philological Study of Swiss Legal Texts
We describe the construction of two corpora in the domain of Swiss legal texts: The DS21 corpus is based on the Collection of Swiss Law Sources and contains historical legal texts from the early Middle Ages up to 1798; the Swiss Legislation Corpus (SLC) is based on the Classified Compilation of Swiss Federal Legislation and contains all current Swiss federal laws. The paper summarizes the key p...
متن کاملBuilding Corpora of Technical Texts: Approaches and Tools
Building corpora of technical texts in Science, Technology, Engineering, and Mathematics (STEM) domain has its specific needs, especially the handling of mathematical formulae. In particular, there is no widely accepted format to represent and handle math. We present an approach based on multiple representations of mathematical formulae that has been used for math retrieval, similarity and clus...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada
سال: 2002
ISSN: 0102-4450
DOI: 10.1590/s0102-44502002000200004